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ABSTRACT 

This  paper  surveys  the  overall  structure  and  design  of  some  of 
the  prototype  relational  database  systems  frequently  referenced 
in  the  literature,  and  then  examines  in  some  detail  some  inter- 
esting approaches  that  have  been  taken  in  implementing  various 
requisite  features  for  a  comprehensive  relational  database  sys- 
tem. 
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1.  INTRODUCTION 

A  series  of  papers  by  E.F.  Codd  between  1970  and  1971  (1,  2,  3. 
k)   provided  a  great  impetus  to  the  development  of  relational 
database  systems  and  associated  research  activities.   A  number 
of  prototype  systems  were  implemented  to  demonstrate  the  feasi- 
bility for  supporting  high  level,  non-procedural,  set-oriented 
data  sublanguages.   At  the  same  time  a  number  of  systems  were 
developed  to  serve  as  database  access  and  storage  subsystems 
for  implementing  high  level  relational  data  sublanguages.   More 
recently  efforts  have  been  directed  toward  implementing  compre- 
hensive relational  database  systems  that  incorporate  solutions 
to  various  specific  problems  that  have  been  identified.   Table 
1.1  lists  a  number  of  prototype  relational  systems  that  are  fre- 
quently referenced  in  the  literature. 

A  comprehensive  relational  database  system  must  not  only  provide 
interfaces  for  high  level,  non-procedural  data  sublanguages  for 
both  the  application  programmers  and  remote  terminal  users,  but 
also  support  various  users'  views  of  the  stored  database,  enforce 
data  integrity  and  authorization  constraints,  control  concurrent 
updates  of  a  shared  database,  and  provide  transaction  and  system 
recovery.   Efficient  data  access  strategies  and  storage  struc- 
tures and  query  processing  techniques  must  also  be  employed  to 
meet  the  response-time  requirements  of  on-line  terminal  users. 
Also  a  good  report  generator  should  be  provided  to  display  the 
results  of  a  query  in  a  highly  stylized  format. 


Readers  wishing  for  a  tutorial  on  relational  database  are  refer- 
red to  Date  (6)  and  Chamberlin  (7). 

Table  1.1   Some  Prototype  Relational  Database  Systems 

MADAM  (MacAIMS  Data  Management  System)  —  Project  MAC,  MIT 

RDM3  (Relational  Data  Management  System)  --  MIT 

DAM AS  (Data  Management  System)  —  MIT 

PDMS  (Relational  Data  Management  System)  — 

General  Motors  Research  Lab.,  Warren,  Mich. 

IS/l  (Information  System  1)  — 

IBM  UK  Scientific  Centre,  Peterlee,  England 

PRTV  (Peterlee  Relational  Test  Vehicle)  -- 

IBM  UK  Scientific  Centre,  Peterlee,  England 

RM  (Relational  Memory)  — 

IBM  Cambridge  Scientific  Center,  Cambridge,  Mass. 

XRM  (Extended  Relational  Memory)  — 

IBM  Cambridge  Scientific  Center,  Cambridge,  Mass. 

QUERY -BY -EXAMPLE  --  IBM  Yorktown  Heights  Research  Lab.,  N.Y. 

SEQUEL  --  IBM  San  Jose  Research  Lab.,  Calif. 

GMI3  (Generalized  Management  Information  System)  — 

IBM  Cambridge  Scientific  Center,  Cambridge,  Mass. 
MIT  Sloan  School  of  Management,  Cambridge,  Mass. 

3QUIRAL  (Smart  Query  Interface  for  Relational  Algebra)  — 
University  of  Utah 

RENDEZVOUS  --  IBM  San  Jose  Research  Lab.,  Calif. 

ZETA/TORUS  (Toronto  Understanding  System)  -- 
University  of  Toronto 

INGRES  (Interactive  Graphics  and  Retrieval  System)  — 
University  of  California  at  Berkeley 

SYSTEM  R  --  IBM  San  Jose  Research  Lab.,  Calif. 


^his  paper  first  describes  the  overall  architecture  and  design 
of  some  of  the  prototype  systems,  and  then  examines  some  of  the 
approaches  that  have  been  taken  in  implementing  the  various  re- 
quisite features  for  a  comprehensive  relational  system. 
The  illustrative  examples  used  in  this  paper  are  based  on  the 
following  database  of  employees  and  their  departments i 

EMP  (NAME,  DNO,  JOB,  SAL,  AGE,  MGR)  »  primary  key  is  NAME 
DEPT  (DNO,  LOC,  NEMPS)  ;  primary  key  is  DNO 

Each  EMP  tuple  contains  the  employee  name,  department  number, 
job  name,  salary,  age,  and  manager's  name.   Each  DEPT  tuple 
contains  the  department  number,  location  (floor),  and  the  num- 
ber of  employees  in  the  department. 

Some  of  the  relational  database  terms  used  in  this  paper  are 
listed  in  Table  1.2  along  with  their  equivalents  found  in  the 
literature. 

Table  1.2  Some  Relational  Database  Terms 

base  relation  --  schema,  data  model,  table,  stored  relation 

view  —  subschema,  external  data  model,  derived  relation, 

implied  relation,  virtual  relation,  defined  relation 

column,  field  --  attribute,  domain,  component 

tuple  --  row,  entry 

range  variable  --  tuple  variable,  entry  variable 

selection  —  restriction 

projection 

join  --  semi  join,  natural  join,  composition 

statistical  or  aggregate  function  —  libray  function,  built-in 

function 


2.  SYSTEM  ARCHITECTURE  AND  DESIGN 

This  section  describes  the  overall  architecture  and  design  of 
a  number  of  prototype  relational  systems.   Systems  treated  in  de- 
tail include  MADAM,  IS/l,  PRTV,  RM,  XRM,  ZETA,  INGRES,  and  SYS- 
TEM R.   Other  systems  that  are  mentioned  in  Table  1.1  are  only 
briefly  described  below. 

General  'Motor's  RDMS  (8)  supports  a  data  sublanguage  based  on 
the  relational  algebra.   It  has  been  used  in  several  applica- 
tions for  engineering  and  commercial  data  analysis.   MIT's  RDMS 
(9)  is  an  upgraded  version  of  MADAM,  and  is  widely  in  use  with- 
in the  administrative  departments  at  MIT.   MIT's  DAMA3  (10,  11) 
and  Utah's  SQUIRAL  (12)  have  made  interesting  contributions  to 
the  area  of  query  optimization  (see  section  k) .   IBM's  SEQUEL 
(13,  I**-,  15)  and  QUERY-BY-EXAMPLE  (16,  1?)  implement  two  of  the 
best  developed  high  level,  non-procedural  data  sublanguages, 
SEQUEL  (18)  and  QUERY -BY -EXAMPLE  (19),  respectively.   Both  sys- 
tems use  XRM  as  data  access  and  storage  subsystem.   GMI3  (20), 
which  is  being  used  at  MIT  for  modeling  New  England  energy  re- 
sources, was  developed  by  extending  the  SEQUEL  prototype  to  ac- 
commodate a  multi-user  environment.   RENDEZVOUS  (21)  is  a  natu- 
ral language  understanding  system  that  supports  a  relational 
view  of  data. 

2.1  MADAM  (MacAIMS  Data  Management  System)  (22,  23) 

MADAM,  developed  as  a  part  of  Project  MacAIMS  at  MIT's  Project 
MAC,  is  generally  acknowledged  to  be  the  first  system  that  sup- 


ported  an  n-ary  relational  model  of  data  and  a  set-oriented  data 
sublanguage  based  on  the  relational  algebra.   MADAM  was  imple- 
mented on  MULTIC3  in  PL/l ,  the  language  in  which  MULTIC3  itself 
was  written.   Fig.  2.1.1  shows  the  overall  structure  of  MADAM. 
The  user-oriented  subsystem  consists  of  a  query  language  inter- 
preter and  display  handler.   The  division  of  the  system  into  a 
set  of  procedure  and  data  segments  concerned  with  sets  of  data 
elements  (e.g.,  name,  salary,  job,  department  number)  and  another 
set  of  procedure  and  data  segments  concerned  with  sets  of  rela- 
tions (e.g.,  EMP,  DEPT)  is  the  consequence  of  the  view  taken  by 
the  designers  of  MADAM  that  information  one  may  store  in  a  data- 
base consists  of  sets  of  data  elements  and  sets  of  relations 
among  the  data  elements. 

Associated  with  each  type  of  data  elements  are  a  data  segment 
and  a  data  element  module.   Associated  with  each  possible  data 
structure  for  storing  a  relation  (e.g.,  tree,  linked  list,  array) 
are  a  relation  data  segment  and  a  relation  strategy  module. 
Fig.  2.1.1  is  more  fully  explained  in  section  3,1. 

2.2  Peterlee  IS/l   (24) 

IS/l  was  the  first  relational  prototype  implemented  by  IBM. 
This  system,  developed  at  the  IBM  UK  Scientific  Centre,  Peterlee, 
England,  supports  a  query  language  based  on  the  relational  alge- 
bra and  was  implemented  in  PL/l.   Fig.  2.2.1  illustrates  the 
architecture  of  the  system. 


In  the  syntax  section,  a  user  query  is  translated  into  a  string 
of  intermediate  codes.   This  string  is  a  procedure  call  and  asso- 
ciated arguments. 

The  procedure  section  consists  of  a  set  of  PL/l  procedures  that 
support  the  set  of  user  commands.   A  simple  interpreter  is  pro- 
vided to  serve  as  the  interface  between  the  syntax  section  and 
the  procedure  section.   The  interpreter  isolates  the  arguments 
from  each  string  of  intermediate  codes,  pushes  them  onto  an  arg- 
ument stack,  and  passes  control  to  the  procedure  being  called. 
The  database  facilities  section  consists  of  a  set  of  PL/l  pro- 
cedures that  access  and  store  views  and  base  relations. 
IS/l  was  designed  to  allow  the  users  to  easily  extend  the  func- 
tional capabilities  of  the  base  system  by  writing  their  own  PL/l 
procedures  and  linking  them  to  the  system.   The  syntax  section 
is  extended  as  a  side  effect  of  the  procedure  section  extension. 
IS/l  supports  user  views  on  the  base  relations  by  either  storing 
their  definitions  in  strings  of  intermediate  codes  (implicit 
views)  or  by  maintaining  the  materialized  sets  of  tuples  for  the 
views  as  separate  base  relations  (explicit  views).   This  last 
point  is  elaborated  upon  in  section  5.1. 

2.3  PRTV  (Peterlee  Relational  Test  Vehicle)   (25) 

PRTV  was  developed  at  the  IBM  UK  Scientific  Centre,  Peterlee, 
England,  based  on  their  experience  with  IS/l. 
This  system  was  implemented  in  two  major  sections  called  the 
top  end  and  the  bottom  end. 
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The  top  end  was  implemented  primarily  in  the  string  processing 
language  MP/3,  and  partially  in  PL/l .   This  section  of  the  sys- 
tem parses  the  user  language,  processes  and  maintains  the  sys- 
tem directories,  performs  the  mapping  between  relations  and 
storage  structures,  links  user-provided  PL/l  procedures,  and 
performs  query  optimization.   The  designers  of  I3/l  and  PRTV 
feel  very  strongly  that  although  data  sublanguages  based  on 
the  relational  calculus  or  relational  algebra  are  "relational- 
ly  complete  (4),"  they  are  by  no  means  "functionally  complete." 
Thus  13/1  and  PRTV  allow  the  users  to  extend  the  system  to  suit 
their  application  requirements  that  system  designers  may  not 
foresee.   The  query  optimization  techniques  implemented  for 
PRTV  are  due  to  the  works  of  Hall  (26),  and  are  described  in 
some  detail  in  section  4.1. 

The  bottom  end  was  implemented  in  a  mixture  of  PL/l  and  IBM  370 
assembler  language.   This  section  consists  of  a  set  of  proce- 
dures that  perform  all  storage  and  relational  operations  on  the 
database. 

The  user  language  PRTV  supports  is  based  on  the  relational  alge- 
bra.  However,  this  query  language  requires  a  sufficient  know- 
ledge of  algebra  on  the  part  of  the  users,  and  thus  restricts 
the  usability  of  PRTV  as  a  stand-alone  system.   In  fact,  PRTV 
was  used  as  a  low  level  subsystem  for  the  Urban  Management  Sys- 
tem which  created  a  database  of  60  megabytes  with  relations  of 
more  than  60,000  tuples  and  degrees  as  high  as  200.   This  data- 
base is  thought  to  be  the  largest  relational  database  to  have 
had  regular  use  (one  year). 


Fig.  2.1.1   Architecture  of  MADAM 
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2.4  RM  (Relational  Memory) 

XRM  (Extended  Relational  Memory)   (2?»  28,  15) 

Both  RM  and  XRM  were  developed  at  the  IBM  Cambridge  Scientific 
Center  for  use  as  low-level,  procedural  relational  interfaces 
for  host  language  systems,  or  as  intermediate  interfaces  in  im- 
plementing high-level,  non-procedural  relational  languages. 
RM  allows  variable  length  byte  strings  to  be  stored  and  accessed 
by  numerical  identifiers  in  a  set  of  pages  called  the  "entity 
space."  Binary  relations  whose  data  elements  are  either  inte- 
gers or  numerical  identifiers  for  the  byte  strings  stored  in 
the  entity  space  are  stored  and  operated  on  in  another  set  of 
pages  called  the  "relation  space."  Note  that  RM  supports  only 
binary  relations.   Binary  relations  are  logically  sufficient, 
for  relations  of  higher  degrees  can  be  straightforwardly  decom- 
posed into  sets  of  semantically  equivalent  binary  relations* 
Binary  relations,  however,  cannot  adequately  support  large, 
formatted  databases  (chapt.  11  of  6). 

XRM  was  developed  by  extending  RM  to  support  n-ary  relations, 
n-ary  tuples  are  stored  in  the  RM  entity  space,  and  inversions 
that  provide  efficient  associative  access  to  these  n-ary  tuples 
are  stored  in  the  RM  relation  space.   A  detailed  discussion  of 
XRM  is  given  in  section  3.3* 

2.5  ZET A/TORUS   (29,  30) 

ZETA  is  a  relational  prototype  under  implementation  at  the  Uni- 
versity of  Toronto.   Three  principal  levels  constitute  this  sys- 
tem, as  shown  in  Fig.  2.5.1, 
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The  lew-level,  called  MINIZ,  implements  elementary  commands  that 
directly  eperate  on  single  relations.  These  include  commands  te 
create  and  destroy  relations,  insert  and  delete  tuples,  retrieve 
and  update  columns  within  a  tuple,  and  mark  selected  tuples  of 
a  relation.  The  most  interesting  command  at  this  level  is  the 
mark  command.  A  mark  is  a  unary  relation  (i.e.,  a  list)  of  tuple 
identifiers  of  those  tuples  of  a  relation  satisfying  a  selection 
operation.  This  facility  is  employed  in  the  intermediate  level 
for  the  support  of  views  (see  section  5*2)  and  for  the  implemen- 
tation of  queries  involving  two  or  more  relations. 
The  intermediate  level,  consisting  primarily  of  the  EXECUTOR 
module,  performs  the  translation  of  the  intermediate  form  of 
the  user  commands  into  a  sequence  of  lew  level  commands.  The 
EXECUTOR  consists  of  an  interpreter,  a  set  of  "schema"  proce- 
dures, and  a  set  of  utilities.  The  interpreter  breaks  down  the 
intermediate  form  of  a  user  command  into  a  sequence  of  utility 
operations.  For  each  lew  level  command,  a  corresponding  utility 
is  provided  at  this  level.  The  interpreter  invokes  the  schema 
procedures  whenever  data  pertaining  to  user  views  or  base  rela- 
tions are  to  be  retrieved  or  accessed. 

The  high  level  ef  ZETA  consists  of  a  host-programming  language 
system  (HLS)  and  a  self-contained  language  system  (SLS).  The 
HLS  supports  a  SEQUEL-like  data  sublanguage  embedded  in  Pl/l . 
The  SLS  provides  a  syntax- table-driven  compiler-compiler  to 
tailor  a  SEQUEL-like  stand-alone  query  language  for  particular 
applications. 
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The  low  level  was  used  as  a  back  end  database  system  for  a  nat- 
ural language  understanding  system  called  TORUS  (Toronto  Under- 
standing System). 

2.6  INGRES  (Interactive  Graphics  and  Retrieval  System)   (31) 

INGRES,  developed  at  the  University  of  California  at  Berkeley, 
is  implemented  on  top  of  the  UNIX  operating  system  in  C,  the 
language  in  which  UNIX  itself  is  written.   INGRES  supports  QUEL 
(Query  Language),  a  high  level  language  based  on  the  relational 
calculus  for  terminal  users;  EQUEL  (Embedded  QUEL),  QUEL  embed- 
ded in  C  for  application  programmers;  and  CUPID,  a  graphics- 
oriented  casual  user  language. 

INGRES  is  implemented  as  a  sequence  of  UNIX  processes  as  shown 
in  Fig.  2.6.1.   Communication  between  a  UNIX  process  and  the 
tree  of  subprocesses  it  spawns  is  accomplished  via  one-direction 
communication  links  called  "pipes."  Each  pipe  is  written  into 
by  one  process  and  read  by  another. 

Process  1  is  created  when  INGRES  is  invoked  from  the  UNIX.   It 
should  be  noted  that  INGRES  is  treated  as  a  user  process  under 
the  UNIX  operating  system  environment.   Process  1  is  the  inter- 
active terminal  monitor  which  maintains  a  workspace  for  a  user 
to  formulate  and  edit  a  collection  of  INGRES  commands.   The  con- 
tents of  the  user  workspace  are  passed  down  pipe  A  as  a  string 
of  ASCII  characters  for  execution  by  later  processes. 
Process  2  translates  the  input  query  passed  down  pipe  A  into  a 
parse  tree  using  a  lexical  analyzer  and  a  parser.   The  qualifi- 
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cation  portion  of  the  query  is  converted  to  an  equivalent  Boole- 
an expression  in  conjunctive  normal  form.   A  conjunctive  normal 
form  expression  is  amenable  to  the  query  modification  techniques 
that  INGRE3  employs  for  the  support  of  user  views  and  authoriza- 
tion and  integrity  control  (32,  22).   Query  modification  is  sim- 
ply the  merging  of  the  parse  tree  for  a  query  with  the  stored 
parse  trees  for  the  view  definitions,  and  authorization  and  inte- 
grity constraints.   Section  6.2  gives  an  example  of  the  query 
modification  technique  in  action.   Process  2  also  contains  rou- 
tines for  a  crude  concurrency  control.   It  is  noted  here  that 
views  and  selective  authorization  control  have  not  been  imple- 
mented, and  that  only  simple  integrity  assertions  involving 
single  range  variables  and  no  aggregate  functions  (e.g.,  average, 
sum,  count)  are  currently  enforced. 

Process  3  accepts  from  process  2  a  string  of  tokens  representing 
a  user  query,  and  performs  the  following  two  functions i 

1.  decomposition  of  a  query  involving  more  than  one  range 
variable  (multi-variable  query)  into  a  sequence  of  inter- 
actions involving  only  one  range  variable  (one-variable 
query).  For  details,  see  (3*0  and  section  4.2. 

2.  processing  of  a  one-variable  query. 

Retrieve  commands  are  completely  processed  in  process  3»   Update 
commands  (Replace,  Delete,  and  Append),  however,  are  converted 
to  appropriate  Retrieve  commands  to  simply  isolate  the  qualify- 
ing tuples.   These  tuples  are  then  spooled  onto  a  temporary  file 
which  is  passed  to  process  4  for  the  actual  update  operations. 
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The  reasons  for  this  deferred  update  are  explained  in  section 
k.2.      All  utility  commands  (e.g.,  create  and  destroy  base  rel- 
ations and  secondary  indices)  are  simply  passed  down  to  process 
4.   Note  in  Fig.  2.6.1  that  the  results  of  a  retrieve  command 
are  returned  in  a  stylized  format  directly  to  the  user  termi- 
nal. 

Process  k   is  a  collection  of  overlays  that  support  deferred  up- 
date and  all  utility  commands. 

Note  in  Fig.  2.6.1  that  all  commands  are  passed  to  the  right 
via  pipes  A,  B,  and  C,  whereas  all  error  messages  are  passed 
to  the  left  via  pipes  D,  E,  and  F. 

When  INGRES  is  invoked  from  an  EQUEL  program,  the  executable 
module  for  the  EQUEL  program  (produced  by  a  pre-compiler  and 
a  compiler)  replaces  the  interactive  terminal  and  process  1 
of  Fig.  2,6.1. 

The  size  of  each  INGRES  process  largely  explains  the  particu- 
lar process  structure  chosen  by  the  designers  of  INGRES. 

Table  2.6.1   INGRES  Space  Requirements  (in  bytes) 

Access  Methods  (AM)  —  UK 

EQUEL  (precompiler,  compiler,  AM)  —  4lK 

Process  1  —  10K 

Process  2  and  AM  —  5^K 

Process  3  and  AM  —  56K 

Process  4  (AM,  8  overlays)  —  171K 
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The  size  of  a  UNIX  process  is  limited  to  64K  bytes  on  a  PDP 
llAo,  and  to  128K  bytes  on  a  PDP  llA5  or  11/70. 
The  INGRES  designers  believe  that  INGRES  should  appear  as  a 
user  job  to  UNIX  so  that  INGRES  would  operate  on  a  standard 
UNIX.   Consequently,  INGRES  contains  no  facilities  for  mem- 
ory management  and  I/O?  UNIX  performs  these  functions.   This 
design  decision  should  be  contrasted  with  that  for  SYSTEM  R 
(see  below).  Readers  wishing  for  a  more  detailed  account  of 
the  UNIX  operating  system  should  consult  (35) • 

2.7  SYSTEM  R   (36) 

SYSTEM  R,  developed  at  the  IBM  San  Jose  Research  Laboratory 
as  a  vehicle  for  research  in  relational  database,  is  by  far 
the  most  sophisticated  and  comprehensive  prototype  system 
implemented.   This  system  provides  extensive  data  control  fa- 
cilities, including  selective  authorization,  integrity  assert- 
ions, transactions  as  units  of  consistency  (see  section  6.3), 
This  system  implements  much  of  the  results  of  the  recent  in- 
vestigations in  concurrency  control  (37  •  38,  39) •  including 
such  notions  as  the  levels  of  consistency,  granularity  of 
locks,  and  locking  at  the  logical  level  as  well  as  at  the 
physical  level  (see  section  7*2).  The  system  also  provides 
facilities  for  transaction  backout,  and  transaction  and  sys- 
tem recovery  (see  section  8.2).   SYSTEM  R,  rather  than  rely- 
ing on  the  automatic  paging  of  virtual  memory  in  VM/370,  pro- 
vides its  own  memory  management  and  I/O  facilities.   This  de- 
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cision  was  made  to  satisfy  the  implementation  requirements  for 
a  novel  recovery  scheme  (see  section  8.2). 

Fig.  2.7.1  shows  the  overall  structure  of  SYSTEM  R.   The  RDI 
(Relational  Data  Interface)  is  the  principal  user  interface, 
consisting  of  a  set  of  operators  for  data  definition  and  mani- 
pulation and  a  set  of  operators  for  database  locking  and  trans- 
action.  A  special  RDI  operator  called  SEQUEL  is  provided  to 
support  terminal  users  and  PL/l  host  language  system  users. 
Users  are  allowed  to  write  programs  on  top  of  the  RDI  for  the 
support  of  other  relational  interfaces  (e.g.,  QUERY  BY  EXAMPLE) 
or  hierarchical  or  network  interfaces. 

The  RD3  (Relational  Data  System)  is  the  subsystem  that  implements 
the  RDI  operators.   The  RD3  provides  authorization  and  integrity 
control  facilities,  supports  various  users'  views  of  the  data- 
base, maintains  the  catalogs  of  external  names,  and  performs  an 
access  path  optimization. 

The  RSI  (Relational  Storage  Interface)  is  the  internal  interface 
which  provides  operators  for  tuple-at-a-time  access  to  base  rel- 
ations, and  also  operators  for  data  definition,  data  recovery, 
and  transaction  management. 

The  RSS  (Relational  Storage  System)  is  a  complete  storage  subsys- 
tem that  supports  the  RSI  operators.   The  RSS  maintains  access 
paths  to  the  base  relations,  and  secondary  indices  on  selected 
columns  of  the  base  relations.   The  RSS  maintains  transaction 
consistency  by  performing  locking,  deadlock  detection,  and  back- 
out.   The  RSS  also  performs  memory  management  and  transaction 
and  system  recovery. 
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Fig.  2.5.1   Architecture  of  ZETA 
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3.  STORAGE  STRUCTURES  AND  ACCESS  STRATEGIES 

This  section  examines  the  storage  structures  and  access  strate- 
gies used  in  MADAM,  PRTV,  RM,  XRM,  INGRES,  and  SYSTEM  R. 

3.1  MADAM 

The  following  discussion  refers  to  Fig.  2.1.1. 

Each  type  of  data  elements  (e.g.,  name,  salary,  job,  column  name 
of  a  relation)  is  associated  with  a  different  DEM  (Data  Element 
Module)  and  a  different  DES  (Data  Element  Segment).   Furthermore, 
each  type  of  data  structure  for  storing  a  relation  (e.g.,  tree, 
list,  array)  is  associated  with  a  different  R3M  (Relation  Strate- 
gy Module)  and  a  different  RDS  (Relation  Data  Segment). 
A  DEM  converts  appropriate  type  of  data  elements  from  their  nat- 
ural form  to  their  standard  form  in  which  all  data  elements  are 
stored.   For  example,  the  standard  form  for  data  elements  of 
type  "date,"  "11/25/77"  or  "Nov.  25,  1977,"  may  be  "771125." 
Since  most  data  elements  are  either  character  strings  or  inte- 
gers, a  D3M  (Data  Strategy  Module)  for  character  strings  and  a 
D3M  for  integers  are  provided.   A  DSM  generates  a  unique  refer- 
ence number  for  each  data  element.   The  reference  number  for  an 
integer  is  the  integer  itself,  whereas  the  reference  number  for 
a  string  of  characters  is  determined  by  an  algorithm  which  pre- 
serves the  lexical  ordering  between  different  character  strings. 
The  fixed-length  (one  word)  reference  number  is  used  in  all  sub- 
sequent operations  on  the  data  element.   Since  the  reference 
numbers  preserve  the  lexical  ordering  between  the  data  elements 


19 


they  represent,  the  frequent  comparison  and  ranking  operations 
can  be  directly  performed  on  them.  Moreover,  these  operations 
can  be  performed  much  more  quickly  on  the  single-word  long  ref- 
erence numbers  than  on  the  variable-length  data  elements. 
Within  a  DES  (Data  Element  Segment)  all  data  elements  of  one 
type  are  stored  in  a  binary  search  tree. 

The  designers  of  MADAM  take  the  view  that  for  each  type  of  data 
structure  for  storing  a  relation  a  different  set  of  procedures 
should  be  provided  to  support  operations  on  the  relation.  Each 
RSM  is  designed  to  perform  all  database  operations  on  only  one 
type  of  data  structure.  A  potential  difficulty  of  this  approach 
arises  when  it  becomes  necessary  to  operate  on  two  relations 
stored  in  different  types  of  data  structure.  A  canonical  form 
storage  structure  for  relations  is  therefore  defined  as  the 
standard  two  dimensional  array  representation  of  a  relation. 
Each  RSM  is  allowed  to  accept  a  relation  stored  in  canonical 
form  and  produce  the  output  in  canonical  form.   Thus  when  it 
becomes  necessary  to  operate  on  two  relations  stored  in  differ- 
ent storage  structures,  one  of  the  relations  is  converted  to 
canonical  form  by  its  RSM.   The  RSM  for  the  other  relation  then 
performs  the  actual  operation  using  the  canonical  form  as  one 
of  its  inputs. 

All  data  elements  and  domain  names  of  a  relation  are  uniformly 
referred  to  by  their  reference  numbers. 
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3.2  PRTV 

PRTV  implements  various  techniques  to  increase  storage  space  uti- 
lization and  to  reduce  data  retrieval  time.   These  include  the 
use  of  a  standard  format,  sorting,  and  suppression  and  compres- 
sion of  data. 

PPTV  stores  all  relations  sorted  on  the  values  of  some  column. 
Sorting  makes  the  execution  of  operations  such  as  join  more  ef- 
ficient, and  allows  for  the  suppression  of  duplicate  leading 
fields. 

Data  suppression  is  performed  as  follows i   In  each  record,  lead- 
ing fields  are  suppressed  if  they  are  identical  to  the  corres- 
ponding fields  of  the  preceding  record;  and  the  remaining  fields 
are  stored  along  with  an  indication  of  the  number  of  fields  act- 
ually stored.   For  example,  the  records  shown  in  Fig.  3.2.1  are 
suppressed  and  stored  as  shown  in  Fig.  3*2.2. 

Fig.  3.2.1  Records  Before  Suppression 

12  2 

1  2  3 

1  1  3 

2  1  3 

Fig.  3.2.2  Records  of  Fig.  3.2.1  After  Suppression 

12  2 

1  3 
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PRTV  performs  data  compression  as  follows i  The  first  record  in 
each  block  of  records  is  stored  in  its  entirety;  in  each  of  the 
subsequent  records,  all  fields  that  have  not  been  suppressed 
are  Exclusive-Ored  with  the  corresponding  fields  of  the  first 
record i  only  the  non-zero  bytes  that  result  from  this  operation 
are  stored,  along  with  a  bit  map  that  indicates  the  positions  of 
these  bytes.   For  example,  the  records  of  Figs.  3.2.3a  and  3.2.3b 
are  Exclusive-Ored  to  yield  the  record  shown  in  Fig.  3.2.3c. 

Fig.  3.2.3  PRTV  Data  Compression 

a)  01011111  01011100  01100101  01100011 

b)  01011111  01110101  01100101  01000011 
XOR   00000000  00101001  00000000  00100000 

c)  01010000  00101001  00100000 
bit  map 

3.3  RM,  XRM 

RM  uses  one  set  of  pages  to  store  data  elements,  and  another  set 
of  pages  to  store  binary  relations.   A  data  element  is  referenced 
by  a  numeric  identifier  consisting  of  a  page  address  and  an  off- 
set within  the  fixed  header  area  in  the  page  (see  Fig.  3«3«1)« 
The  referenced  header  area  position  contains  the  actual  byte  ad- 
dress for  the  data  element.   A  binary  relation  identifier  also 
consists  of  a  page  address  and  an  offset  within  the  page  header 
area.   The  header  area  position  contains  the  byte  address  of  the 
first  tuple  of  the  relation.   RM  maintains  a  linked  list  of  all 
tuples  of  a  relation  in  increasing  order  of  the  left  column  values 
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of  the  tuples.   All  tuples  with  the  same  left  column  values,  in 
turn,  are  represented  in  a  linked  list  (linked  to  the  common 
left  column  value)  in  increasing  order  of  the  right  column  values 
(see  Fig.  3.3.2). 

XRM  stores  all  n-ary  tuples  of  a  relation  in  the  RM  entity  space 
(set  of  data  pages).  Each  field  of  an  n-ary  tuple  is  represent- 
ed as  a  fixed  length  integer,  which  is  the  actual  value  for  an 
integer  column,  and  which  is  the  numeric  identifier  for  a  string 
of  characters  stored  in  another  data  page  for  a  non-integer  col- 
umn. 

XRM  provides  a  class  relation  for  all  data  elements  belonging  to 
the  same  column.   This  is  implemented  via  a  binary  relation  in 
RM  relation  space  (set  of  relation  pages).   Each  tuple  of  this 
relation  consists  of  the  hashed  value  of  a  data  element  and  the 
numeric  identifier  for  the  data  element.  A  class  relation  is 
used  for  quick  access  from  the  value  of  a  data  element  to  its 
numeric  identifier. 

XRM  also  maintains  a  binary  relation  for  each  n-ary  relation 
so  that  a  particular  tuple  of  the  relation  may  be  directly  ac- 
cessed given  its  primary  key  value.   Each  tuple  of  this  binary 
relation  contains  the  hashed  value  of  the  key  and  the  identifi- 
er of  the  n-ary  tuple. 

Rapid  access  to  a  tuple  with  a  particular  value  in  a  particular 
field  is  also  supported  by  an  inversion  relation.   There  is  also 
a  master  relation  which  contains  information  about  all  other  rel- 
ations.  The  relation  identifier  of  a  relation  is  then  the  iden- 
tifier of  a  tuple  in  the  master  relation  which  describes  the  rel- 
ation. 
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Fig.  3.3.1   RM  Data  Page  Layout 
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3.4  INGRES 

INGRES  supports  five  storage  structures.  Four  of  these  are 
keyed,  that  is  the  storage  location  of  any  tuple  within  a  file 
that  contains  a  relation  is  determined  as  a  function  of  the 
value  of  the  tuple's  primary  key.  These  are  termed  hashed, 
ISAM,  compressed  hashed,  and  compressed  ISAM.  The  non-key 
structure  is  termed  a  heap,  and  stores  tuples  of  a  relation 
independently  of  their  key  values. 

The  non-key  structure  involves  a  low  overhead,  and  is  intended 
for  relations  of  small  cardinality,  or  temporary  relations  cre- 
ated as  intermediate  results  during  query  processing.   New 
tuples  are  simply  appended  to  the  end  of  a  file  in  the  order 
supplied,  and  their  byte  offsets  within  the  file  become  their 
tuple  identifiers. 

The  four  keyed  storage  structures  share  a  common  page  layout 
shown  in  Fig.  3,b,l,      A  tuple  identifier  consists  of  a  page  ad- 
dress and  a  line  number  in  the  fixed  line  table  (header  area 
in  RM/XRM  pages).   The  referenced  entry  in  the  line  table  con- 
tains the  byte  address  of  the  tuple.   The  "next  primary  page" 
pointers  chain  together  all  primary  pages.   A  relation  is  init- 
ially represented  as  a  linked  list  of  primary  pages  in  a  file. 
The  "next  overflow  page"  pointers  link  together  all  overflow 
pages  containing  tuples  that  do  not  fit  in  the  primary  pages. 
The  general  structure  of  a  keyed  file  is  illustrated  in  Fig. 
3.^.2.   Each  keyed  file,  consisting  of  primary  and  overflow 
pages,  is  dedicated  to  the  storage  of  a  single  relation.   This 
decision  not  to  cluster  tuples  from  different  relations  that 
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may  be  accessed  together  was  made  so  that  the  INGRES  access  meth- 
ods and  query  decomposition  procedures  (see  section  4.2)  would 
be  manageable.   The  decision  was  also  a  direct  consequence  of 
the  fact  that  INGRES  provides  no  memory  management  or  I/O  facili- 
ties of  its  own.   Logically  adjacent  pages  in  a  UNIX  file  are 
not  necessarily  physically  adjacent,  and  only  a  small  number  of 
tuples  from  different  relations  may  be  grouped  in  a  page  of  512 
bytes  that  UNIX  supports. 

In  a  hashed  file,  tuples  are  distributed  randomly  throughout  the 
file  according  to  the  values  of  their  primary  keys. 
In  an  ISAM  file,  tuples  of  a  relation  are  initially  sorted  to 
produce  an  ordering  on  a  particular  key.   The  heighest  key  values 
in  all  the  primary  pages  comprising  the  file  are  recorded  in  a 
multilevel  directory  in  several  pages  following  the  primary  pages 
within  the  file  itself.  This  directory  is  static,  that  is,  it 
does  not  change  during  subsequent  insertion,  deletion,  and  update 
of  tuples.   Tuples  in  a  primary  page  and  its  overflow  pages  are 
not  maintained  in  sort  order.  An  INGRES  ISAM  file  is  similar  to 
IBM's  ISAM.   A  good  discussion  on  ISAM  can  be  found  in  Chapter 
7-6  of  (40). 

An  ISAM  file  is  useful  in  situations  where  the  key  value  is  spe- 
cified as  falling  within  a  range  of  values,  whereas  a  hashed 
file  is  appropriate  when  access  is  conditioned  on  a  specific  key 
value. 

In  both  keyed  storage  structures  mentioned  above,  fixed  length 
tuples  are  stored.   INGRES  implements  a  compressed  hashed  file 
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and  a  compressed  ISAM  file.   These  schemes  use  a  hashed  file 
and  an  ISAM  file  in  conjunction  with  some  data  compression  tech- 
niques (41).   Currently  blanks  and  portions  of  a  tuple  which 
are  identical  to  the  preceding  tuple  are  suppressed.   The  com- 
pressed files  are  useful  when  an  increase  in  storage  utiliza- 
tion outweighs  the  overhead  involved  in  encoding  and  decoding 
the  variable  length  field  values  of  tuples. 

3.5  SYSTEM  R 

The  SYSTEM  R  storage  structure  is  illustrated  in  Fig.  3* 5.1. 
A  segment  is  an  address  space  in  VM/370  virtual  memory.  Each 
segment  consists  of  a  set  of  pages.   A  page  is  the  unit  of 
data  transfer  between  the  main  memory  and  secondary  storage 
devices.   Each  segment  may  contain  many  relations!  however, 
each  relation  is  entirely  contained  within  a  segment. 
Unlike  in  INGRES,  each  page  may  contain  tuples  from  different 
relations,  so  that  extra  page  accesses  may  be  avoided  when 
tuples  from  different  relations  are  accessed  together?  however, 
each  tuple  is  entirely  contained  within  a  page. 
Some  pages  are  reserved  for  the  storage  of  tuples  of  user  rel- 
ations (data  pages),  while  other  pages  are  intended  for  the 
storage  of  index  structures  and  system  catalogs.   Also  several 
types  of  segment  are  supported  to  satisfy  different  requirements 
with  appropriate  overheads.   For  example,  one  type  of  segment 
is  used  for  the  storage  of  shared  data.   Facilities  must  be  pro- 
vided for  concurrent  access,  transaction  backout,  and  segment 
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recovery  for  the  data  of  this  segment.   A  different  segment  type 
is  used  for  the  storage  of  temporary  relations  generated  as 
intermediate  results  of  a  query  processing.   The  overhead  incur- 
red for  the  support  of  a  segment  for  shared  data  is  not  necessary 
for  this  type  of  segment.   Another  type  of  segment  is  used  for 
maintaining  the  access  path  structures  or  internal  catalogs. 
Each  tuple  of  a  relation  is  stored  as  a  contiguous  sequence  of 
field  values  along  with  a  prefix.   The  prefix  contains  the  rela- 
tion identifier,  the  number  of  fields  in  the  tuple,  pointer 
fields  that  implement  the  links  (see  section  4.3),  and  the  num- 
ber of  pointer  fields.   As  in  RM/XRM  and  INGRES,  each  tuple  is 
associated  with  a  tuple  identifier  which  is  a  concatenation  of 
a  page  number  and  an  offset  within  a  header  area.   Tuples  may 
be  moved  within  a  page  to  allow  space  compaction,  and  their  new 
byte  addresses  entered  in  their  slots  in  the  fixed  header  area. 
Thus  the  tuple  identifiers  generated  before  the  space  compac- 
tion correctly  reference  the  corresponding  tuples  after  the 
space  compaction.   In  case  a  tuple  is  updated  to  a  longer  total 
value  and  space  is  no  longer  available  in  the  current  page,  it 
is  tagged  with  a  new  tuple  identifier  in  a  nearby  page  (see  Fig. 
3.5.1).   If  the  new  tuple  overflows  again,  the  original  tuple 
is  tagged  with  a  tuple  identifier  in  a  page  that  does  not  cause 
the  tuple  to  overflow.  A  tuple  access  via  its  tuple  identifier 
thus  involves  at  most  two  page  accesses. 

Each  tuple  identifier  is  generated  and  used  by  the  system  (RSS) 
to  refer  to  tuples  from  index  structures  and  to  maintain  pointer 
chains.   This  is  further  discussed  in  section  4.3. 
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k.    QUERY  PROCESSING  AND  OPTIMIZATION 

A  number  of  interesting  works  have  been  done  on  optimizing  the 
implementation  of  high  level  data  sublanguages  based  on  the  rela- 
tional calculus  or  relational  algebra.   In  (4-)  Codd  presents  an 
algorithm,  called  "Codd's  reduction  algorithm,"  for  translating 
a  relational  calculus  expression  into  a  sequence  of  relational 
algebra  operations.   Palermo  made  certain  improvements  in  the 
efficiency  of  the  reduction  algorithm  and  implemented  the  alge- 
bra operators  in  APL  (^2).   Smith  and  Chang  (12),  Hall  (26),  and 
Pecherer  (4-3)  have  explored  techniques  for  translating  an  alge- 
braic query  expression  into  an  equivalent,  but  more  efficient, 
form.   Astrahan  and  Chamberlin  (15)  have  proposed  techniques  for 
decomposing  the  block-structured  SEQUEL  statements.   Rothnie  (10, 
11)  and  Wong  and  Youssefi  (3*0  have  implemented  iterative  decom- 
position techniques  for  reducing  a  calculus-based  query  involving 
multiple  range  variables  into  a  sequence  of  queries  involving 
only  single  range  variables.   Senko,  et.  al.  (44)  and  Tsichritzis 
(4-5)  have  proposed  schemes  in  which  a  non-procedural  expression 
is  compiled  into  a  set  of  procedures  in  a  hierarchical  or  network 
oriented  language.   In  (46)  Gotlieb  presents  and  compares  a  num- 
ber of  algorithms  for  implementing  a  join.   In  (47)  Blasgen  and 
Eswaran  present  an  evaluation  of  several  techniques  for  imple- 
menting join,  projection,  and  selection  operations  on  the  basis 
of  their  costs  in  secondary  storage  accesses. 

The  optimizing  techniques  implemented  for  PRTV,  INGRES,  and  SYS- 
TEM R  are  examined  in  this  section. 
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k.l    PRTV 

PRTV  generates  a  query  tree  for  a  sequence  of  relational  algebra 
operations  specified  by  the  user.   The  PRTV  optimizer  then  reorg- 
anizes the  query  tree  according  to  the  following  principles i 

1.  Selections  are  moved  as  far  down  the  tree  (toward  the  leaves) 

as  possible  for  the  earliest  possible  execution. 
.?.  Projections  not  requiring  the  results  to  be  sorted  are 
also  moved  as  far  down  the  tree  as  possible. 

3.  Projections  requiring  sorted  results,  however,  are  moved 
as  far  up  the  tree  as  possible  for  the  latest  possible 
execution. 

4.  Projections  of  a  projection  on  the  same  relation  are 
merged  into  a  single  projection. 

5.  Expressions  involving  several  set  operators  are  rearranged 
according  to  such  standard  rules  as  commutativity  and  dis- 
tributivity . 

6.  Common  subtrees  within  a  query  tree  may  be  realized  once 
into  an  explicit  relation  so  that  duplication  of  the  oper- 
ations may  be  avoided.   Note  here  that  the  optimizer  esti- 
mates the  cost  of  creating  an  explicit  relation  and  the 
cost  of  duplicating  the  operations,  and  chooses  the  cheaper 
alternative. 

7.  The  optimizer  also  selects  among  alternative  implementations 
of  the  relational  operators. 


31 


k.2    INGRES 

Query  processing  in  INGRES  is  done  in  process  3  via  the  OVQP 
(one-variable  query  processor)  and  DECOMP  (decomposition)  rou- 
tines.  The  OVQP  is  invoked  to  completely  process  a  query  which 
involves  a  single  range  variable.   The  DECOMP  is  called  to  re- 
duce a  query  involving  more  than  one  range  variable  into  a  se- 
quence of  one-variable  queries.   The  DECOMP  makes  iterative 
calls  to  the  OVQP  to  process  the  intermediate  one-variable  que- 
ries, and  merges  the  results  of  these  calls.   The  OVQP  and  DE- 
COMP are  discussed  in  detail  below. 

OVQP 

This  program  consists  of  a  module  called  STRATEGY  which  selects 
an  optimal  access  strategy  for  retrieving  tuples  from  a  single 
relation,  and  a  module  called  SCAN  which  evaluates  each  retrieved 
tuple  against  the  predicates  of  the  query  and  creates  the  output 
list  of  values  from  the  qualifying  tuples. 

DECOMP 

Three  techniques  are  used  to  reduce  a  multi-variable  query  to 

a  query  involving  one  less  range  variables!  tuple  substitution, 

one  variable  detachment,  and  reformatting. 

1)  tuple  substitution 

One  range  variable  in  the  query  is  selected,  and  the  relation 

over  which  the  variable  ranges  is  scanned  one  tuple  at  a  time. 

All  references  to  the  selected  variable  in  the  query  are  replaced 

by  appropriate  field  values  from  the  retrieved  tuple.  The  origi- 
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nal  query  now  has  one  less  variable.   Note  that  the  variable 

which  ranges  over  a  relation  with  the  smallest  cardinality  is 

selected  for  substitution. 

As  an  example,  consider  how  the  following  2-variable  query  is 

reduced  by  tuple  substitution. 

(Find  the  names  of  employees  who  earn  more  than  their 
managers) 

RANGE  OF  E,  M  13  EMP 
RETRIEVE  (E.NAME) 
WHERE  E.SAL >  M. SAL   AND 
E.MGR  =  M.NAME 

The  two  range  variables,  E  and  M,  in  this  QUEL  statement  both 

range  over  the  EMP  relation.   Suppose  that  E  is  selected  for 

substitution,  and  that  a  tuple  (Jones,  2,  salesman,  18K,  30, 

Smith)  of  EMP  is  retrieved.   Substitution  of  Jones,  18K,  and 

Smith  for  E.NAME,  E.3AL,  and  E.MGR,  respectively,  yields  the 

following  one  variable  query « 

RANGE  OF  M  IS  EMP 
RETRIEVE  (Jones) 
WHERE  18K>  M.SAL   AND 
Smith  =  M.NAME 

2)  one  variable  detachment 

Recall  that  the  qualification  portion  of  a  QUEL  statement  is 
converted  to  an  equivalent  conjunctive  normal  form  expression 
(see  section  2.6).   Let  us  define  a  simple  clause  to  be  a  con- 
junct of  the  form  (variable. column  comparator  constant). 
For  example,  E.SAL  =  16K  is  a  simple  clause,  but  E.MGR  =  M.NAME 
is  not. 
One  variable  detachment  is  done  in  two  steps  as  follows t 
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First,  for  each  range  variable  V  in  a  simple  clause  C,  create  a 
temporary  relation  T  by  first  performing  a  selection  over  R,  the 
relation  over  which  V  ranges,  with  predicate  C,  and  then  perform- 
ing a  projection  over  all  columns  of  R  that  are  involved  in  the 
target  list  and  all  remaining  conjuncts  of  the  qualification. 
For  example,  consider  the  following  QUEL  statement! 

(Find  the  names  of  employees  over  50  years  of  age  who  work 
on  the  third  floor  and  earn  more  than  their  managers) 

RANGE  OF  E,  M  IS  EMP 
RANGE  OF  D  IS  DEPT 
RETRIEVE  (E.NAME) 
WHERE   E.SAL  >  M.SAL    AND 

S.MGR  =  M.NAME   AND 

E.DNO  =  D.DNO    AND 

D.LOC  =  3        AND 

E.AGE  >  50 

Temporary  relations  Tl  and  T2  are  created  for  the  two  simple 

clauses  in  the  query,  D.LOC  ■  3,  and  E.AGE  >  50.   Tl  is  defined 

over  DNO,  while  T2  is  defined  over  NAME,  SAL,  MGR,  and  DNO. 

RANGE  OF  D  IS  DEPT 
RETRIEVE  INTO  Tl  (D.DNO) 
WHERE   D.LOC  =  3 

RANGE  OF  E  IS  EMP 

RETRIEVE  INTO  T2  (E.NAME,  E.SAL,  E.MGR,  E.DNO) 

WHERE   E.AGE  >  50 

Each  temporary  relation  thus  generated  is  stored  in  a  hashed 
file  (see  section  3.*0«   The  hash  key  is  chosen  as  follows i 
Non-simple  clauses  of  the  form  V.D  =  VI. Dl  in  the  original  que- 
ry are  collected,  where  V  and  VI  are  range  variables,  D  and  Dl 
are  column  names,  and  V  ranges  over  temporary  relation  T.   If 
this  collection  is  empty,  T  is  stored  in  a  heap;  otherwise,  a 
D  is  arbitrarily  selected. 
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Consider  our  example  query  again.   For  variable  D,  which  may  now 

range  over  Tl,  our  collection  contains  only  one  clause,  D.DNO  = 

E.DNO.   Therefore,  Tl  is  stored  hashed  on  DNO.   For  variable  E, 

which  may  now  range  over  T2,  our  collection  consists  of  two 

clauses,  E.MGR  =  M.NAME,  and  E.DN0=D.DN0.   T2  may  then  be  hashed 

on  either  MGR  or  DNO.   T2  may  have  to  be  restructured  with  a 

different  hash  key,  if  this  arbitrary  selection  of  a  key  turns 

out  to  be  inappropriate  (see  reformatting  below). 

Second,  all  simple  clauses  are  removed  from  the  original  query, 

and  all  the  relations  over  which  the  variables  involved  in  the 

simple  clauses  are  replaced  by  the  newly  generated  temporary 

relations.   Remember  these  temporary  relations  are  horizontal 

and  vertical  subsets  of  the  original  relations.   The  result  of 

one  variable  detachment  is  shown  below. 

RANGE  CF  D  IS  Tl 

RANGE  OF  E  13  T2 

RANGE  OF  M  13  EMP 

RETRIEVE  (E.NAME) 

WHERE   E.SAL  >  M.SAL   AND 

E.MGR  -  M.NAME   AND 

E.DNO  =  D.DNO 

3)  reformatting 

Consider  as  an  example  the  3-variable  query  above.   Suppose  that 
Tl  is  hashed  on  DNO,  T2  is  hashed  on  MGR,  and  Tl  has  the  smallest 
cardinality.   Then  range  variable  D  is  selected  for  tuple  substi- 
tution.  When  tuple  substitution  is  made,  E.DNO  =  D.DNO  becomes 
E.DNC  =  constant,  and  the  resulting  simple  clause  can  now  be  re- 
moved by  one  variable  detachment.   Since  one  variable  detachment 
must  be  performed  for  each  tuple  of  Tl  over  which  D  ranges,  re- 
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formatting  T2,  over  which  E  ranges,  to  be  hashed  on  DNO  may  sig- 
nificantly improve  its  processing.   Therefore,  if  T2  was  hashed 
on  MGR  (see  one  variable  detachment  above),  it  is  rehashed  on 
DNC  here. 

Before  concluding  this  subsection,  a  few  reasons  for  the  deferred 
update  mentioned  in  section  2.6  are  explained.  The  examples  used 
in  (31)  are  very  illuminating,  and  are  largely  transplanted  here. 

1 )  secondary  index  anomaly 

Suppose  each  employee  earning  under  15K   is  to  be  given  a  10  per- 
cent raise.   Suppose  also  that  a  secondary  index  on  SAL  is  main- 
tained as  shown  in  Fig.  4.2.1a. 

Fig.  4.2.1   Secondary  Index  Anomaly 

a)   SAL  TIP       b)   SAL   TIP        c)   SAL   TIP 

12K  Tl  13K   T2  13. 2K  Tl 

13K   T2  13. 2K  Tl  14. 3K  T2 

Suppose  update  is  immediate.   The  OVQP  makes  use  of  this  index j 
first  it  updates  the  tuple  identified  by  Tl,  and  then  updates 
the  index  entry  (12K  Tl )  to  (13.2K  Tl )  to  reflect  this.   After 
the  next  qualifying  tuple,  T2,  is  processed,  the  OVQP  encounters 
Tl  again.   In  this  way  a  qualifying  tuple  may  be  updated  an  in- 
definite number  of  times!   See  Figs.  4.2.1b  and  4.2.1c. 

2)  base  relation  anomaly 

Suppose  each  employee  earning  more  than  his  or  her  manager  is  to 
take  a  10  percent  pay  cut.  Consider  the  database  fragment  shown 
in  Fig.  4.2.2. 
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Fig.  4.2.2  Base  Relation  Anomaly 

NAME       SAL       MGR 

Smith      10K       Jones 
Jones      8K       Killer 
Brown      9.5K      Smith 

Suppose  update  is  immediate  here.   The  only  employee  qualifying 
for  a  pay  cut  is  Smith.   Smith  now  earns  9K.   Note  that  Brown 
now  earns  more  than  Smith,  and  as  a  consequence  of  the  immediate 
update  of  Smith's  tuple,  Erown  has  to  take  a  pay  cut,  tool 

It  has  been  pointed  out  that  deferred  update  incurs  a  heavy  over- 
head, and  an  optional  immediate  update  (in  process  3)  is  now  be- 
ing considered. 

4.3   SYSTEM  R 

A  brief  discussion  on  the  access  paths  that  SYSTEM  R  supports  is 
in  order  before  the  SYSTEM  R  optimizer  is  examined. 
SYSTEM  R  supports  three  access  paths  for  tuples  of  relations; 
images,  links,  and  relation  scans.   An  image  is  conceptually  a 
binary  relation  consisting  of  values  from  one  or  more  fields, 
called  sort  fields,  of  a  relation  in  ascending  or  descending 
order,  and  their  corresponding  tuple  identifiers.  At  most  one 
clustering  image  for  a  relation  is  supported,  in  which  tuples 
whose  sort  values  are  "close"  are  stored  physically  "close"  to 
one  another.   Access  to  any  tuple  of  a  relation  with  an  image 
support  is  accomplished  by  keying  on  the  sort  field  value. 
Access  to  a  sequence  of  tuples  within  a  range  of  sort  field 
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values  can  also  be  done  efficiently  on  a  relation  with  an  image 
support.   Each  image  on  a  relation  resides  in  one  or  more  pages 
within  the  segment  in  which  the  relation  is  stored.   The  pages 
in  which  an  image  resides  form  the  nodes  of  a  balanced  B-tree 
(for  a  comprehensive  discussion  on  B-trees,  see  Chapter  k.5   of 
k8      ).   A  link,  on  the  other  hand,  is  a  doubly  linked  list 
that  connects  tuples  in  one  relation  (a  unary  link),  or  a  tuple 
of  one  relation  with  a  sequence  of  tuples  of  another  relation 
(a  binary  link).   A  unary  link  may  be  used  to  maintain  for  a  rel- 
ation a  tuple  ordering  that  is  not  ascending  or  descending  on 
any  sort  fields.   It  may  also  be  used  to  connect  all  tuples  of 
a  relation  for  fast  access  without  the  time  penalty  of  an  en- 
tire relation  scan  (see  below).   A  binary  link  may  significantly 
enhance  the  processing  time  for  a  join.   For  example,  if  a  bin- 
ary link  is  maintained  between  each  DEPT  tuple  and  all  EMP 
tuples  in  our  example  database  based  on  value  matches  in  DNO 
columns  of  both  relations,  to  respond  to  a  query  "Find  the  names 
and  salaries  of  all  employees  in  department  5."  an  appropriate 
DEPT  tuple  is  first  found  and  then  its  binary  link  is  followed 
to  retrieve  all  required  EMP  tuples.   Finally,  a  relation  scan 
is  simply  a  sequential  search  through  all  data  pages  of  a  seg- 
ment that  contains  the  relation.   Since  tuples  from  different 
relations  may  be  stored  in  one  page  (see  section  3»5)i  the  pre- 
fix of  each  tuple  retrieved  must  be  examined  to  determine  if 
the  tuple  belongs  to  the  required  relation. 
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Now  we  are  ready  to  understand  the  SYSTEM  R  optimizer. 
The  objective  of  this  optimizer  is  to  minimize  the  expected  num- 
ber of  page  fetches  from  secondary  storage.   The  optimizer  selects 
an  optimal  method  of  executing  a  query  as  follows «   first,  it 
classifies  the  query  into  one  of  several  types  according  to  the 
presence  of  some  particular  language  features  (e.g.,  join,  group 
by);  second,  it  examines  the  system  catalogs  to  determine  the 
set  of  access  paths  available  for  the  relations  involved;  third, 
it  determines  the  set  of  "reasonable"  methods  for  executing  the 
query  from  the  set  of  access  paths  found;  and  fourth,  it  evalu- 
ates the  expected  cost  formula  for  each  "reasonable"  method, 
and  selects  the  minimum  cost  method.   The  parameters  of  the  cost 
formula  include  the  relation  cardinality  R,  the  number  of  data 
pages  occupied  by  the  relation  D,  the  average  number  of  tuples 
per  data  page  T  (=  R/D),  the  image  cardinality  I  (the  number 
of  distinct  sort  field  values  in  a  given  image),  and  the  esti- 
mated CPU  cost  for  tuple  comparisons  H. 

Some  of  the  "reasonable"  methods  for  executing  a  simple  query 
involving  a  single  relation  are  illustrated  below  along  with 
their  expected  cost.   To  be  concrete,  the  following  SEQUEL  que- 
ry for  finding  the  names  and  salaries  of  all  programmers  over 
25  years  of  age  is  considered. 

SELECT   MAO,  SAL 
FRCP.'!    BMP 

WHERE   JOB  =  programmer  AND 
AGE  >  25 

1)  use  of  a  clustering  image  that  matches  a  predicate  of  the 

form  (column  =  constant).   An  image  "matches"  a  predicate 
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if  the  sort  field  of  the  image  matches  the  column  name  of  the 
predicate.   If  EMP  has  the  clustering  image  on  JOB,  this  op- 
tion is  applicable.   The  expected  cost  to  retrieve  all  result 
tuples  is  R/(T  x  I)  page  accesses. 

2)  use  of  a  clustering  image  that  matches  a  predicate  of  the  form 
(column  comparator  constant).   If  EMP  has  the  clustering  image 
on  AGE,  this  option  is  applicable.   Assuming  that  half  the 
tuples  in  the  relation  satisfy  the  predicate,  the  expected 
cost  is  R/(2  x  T) . 

3)  use  of  an  image  (non-clustering)  that  matches  a  predicate  of 
the  form  (column  =  constant).   Since  each  tuple  requires  a 
page  access,  the  expected  cost  is  R/l. 

*0  use  of  an  image  that  matches  a  predicate  of  the  form  (column 
comparator  constant).   The  expected  cost  is  R/2. 

5)  use  of  a  clustering  image  that  does  not  match  any  predicate. 
The  image  is  scanned,  and  each  tuple  retrieved  is  evaluated 
against  all  predicates.   The  expected  cost  is  R/T  +  H  x  R  x  N, 
where  N  is  the  number  of  predicates  in  the  query. 

6)  use  of  an  image  that  does  not  match  any  predicate.   The  ex- 
pected cost  is  R  +  H  x  R  x  N. 

For  a  more  complete  discussion  on  the  processing  of  a  simple 
query,  see  (47) . 

Now  let  us  examine  some  of  the  "reasonable"  methods  for  executing 
a  query  involving  a  join  of  two  relations.  A  detailed  discussion 
on  these  methods  is  given  in  (47). 
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Again  to  be  concrete,  let  us  consider  a  SEQUEL  query  that  finds 

the  names  and  jobs  of  all  employees  who  work  on  the  third  floor 

and  earn  over  20X. 

SELECT  NAME,  JOB 
FROM  EMP,  DEFT 
WHERE   EMP. SAL  >  20 K       AND 

EMP.DNO  =  DEPT.DNC   AND 

DEPT.LCG  =  3 

1)  use  images  on  join  fields  of  both  relations  i 

a.  Scan  along  the  image  on  DEPT.DNC  to  retrieve  a  DEPT  tuple 
whose  LOC  value  is  3« 

b.  Scan  along  the  image  on  EMP.DNO  to  retrieve  all  EMP  tuples 
whose  DNC  values  match  the  current  DEPT.DNO  value  and 
whose  SAL  values  are  greater  than  20K. 

c.  Repeat  steps  a  and  b  until  the  image  on  DEPT.DNC  is  com- 
pletely scanned. 

2)  sort  both  relations  along  the  join  fields  » 

a.  Scan  EMP  using  its  clustering  image,  and  create  a  tempora- 
ry relation  Rl  with  NAME,  JOB,  and  DNO  fields  from  those 
tuples  of  EMP  whose  SAL  values  are  greater  than  20K. 

b.  Scan  DEFT  using  its  clustering  image,  and  create  a  tempora- 
ry relation  R2  with  DNC  field  from  those  tuples  of  DEPT 
whose  LOC  is  J, 

c.  Sort  Rl  and  R2  on  DNC  fields. 

d.  Apply  method  1)  above. 

3)  use  multiple  passes  : 

a.  Scan  DEPT  using  its  clustering  image,  and  insert  into  a 
main  memory  data  structure  W  the  DNC  values  of  all  DEPT 
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tuples  whose  LOC  values  are  3.   W  is  a  unary  relation  in 
our  example.   If  W  can  hold  only  N  such  tuples,  the  small- 
est N  tuples  are  placed  in  W  during  current  pass.   The 
remaining  tuples  are  inserted  into  W  during  subsequent 
passes. 

b.  Once  W  is  constructed,  scan  EMP  using  its  clustering  image. 
For  each  EMP  tuple  T  whose  SAL  value  >  20K,  T. NAIVE  and 
T.JC3  values  are  placed  in  the  output  list,  if  T.DNO  is 
found  in  W. 

c.  If  W  was  not  large  enough  to  hold  all  qualified  DEPT  tuples, 
steps  a  and  b  are  repeated. 

4)   use  the  TID  algorithm  : 

a.  Construct  a  sorted  list  Rl  of  the  tuple  identifiers  of 
the  EMP  tuples  whose  SAL  values  >  20K  using  the  image  on 
EMP. SAL. 

b.  Construct  a  sorted  list  R2  of  the  TIDs  of  the  DEPT  tuples 
whose  LOC  values  =  3  using  the  image  on  DEPT. LOC. 

c.  Perform  a  simultaneous  scan  over  the  DEPT.DNO  and  EMP.DNC 
images  to  find  the  TID  pairs  of  tuples  whose  DNO  values 
match.   If  Tl  and  T2  of  each  such  TID  pair  (Tl,  T2)  are 
in  Rl  and  R2,  respectively,  then  the  tuple  referenced  by 
Tl  is  retrieved  and  its  NAME  and  JOB  field  values  are 
placed  in  the  output  list. 

The  use  of  a  binary  link  in  processing  a  join  was  explained  ear- 
lier in  this  subsection.  Remember  that  each  of  these ' options  is 
applicable  only  if  its  requisite  access  paths  are  available. 
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SYSTEM  R  currently  relies  on  the  users  to  dynamically  create  and 
delete  all  access  paths  maintained  on  their  base  relations.   Tech- 
niques are  being  investigated  that  will  shift  some  of  these  bur- 
dens away  from  the  users. 

5.  SUPPORT  CF  USER  VIEWS 

An  individual  user's  view  of  the  database  may  be  derived  from  the 
underlying  relations  by  permuting  their  columns,  or  applying  sel- 
ection, projection,  or  join  operations.   A  view  can  be  any  deriv- 
able relation  as  far  as  supporting  retrieval  operations  is  con- 
cerned.  For  supporting  storage  operations  (insert,  delete,  up- 
date), however,  a  view  must  be  identical  to  the  underlying  base 
relations,  except  that  individual  tuples  and  non-primary  key 
fields  may  be  omitted  in  the  view.   This  restriction  is  essential 
in  propagating  the  effects  of  the  storage  operations  down  to  the 
underlying  base  relations.   Systems  that  support  user  views  in- 
clude I 3/1,  PRTV,  SEQUEL,  ZETA,  and  SYSTEM  R.   A  good  introduc- 
tion to  the  notion  of  view  is  given  in  (49)  and  Chapter  10  of 
(6).   This  section  examines  the  view  mechanisms  implemented  for 
I 3/1,  ZETA,  and  SYSTEM  R. 

5.1  13/1 

In  is/l  a  view  definition  is  translated  into  an  equivalent  string 
of  intermediate  codes  (see  section  2.2)  and  stored  away  (an  impli- 
cit view).   When  this  view  is  accessed  for  the  first  time,  its 
definition  is  retrieved  and  executed  to  yield  the  desired  rela- 
tion.  The  relation  thus  materialized  is  then  stored  as  a  new 
base  relation  (an  explicit  view),  so  that  subsequent  accesses  to 
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the  view  may  be  accomplished  without  the  overhead  necessary  in 
executing  its  definition. 

When  a  base  relation  is  updated,  Io/l  makes  all  explicit  views 
defined  on  the  relation  implicit  again,  so  that  the  effect  of 
the  update  is  reflected  in  the  views  when  they  are  recreated. 
If  loss  of  information  may  result  in  recreating  an  explicit 
view,  however,  updates  are  made  directly  on  the  explicit  view. 
The  effect  of  storage  operations  on  a  view  is  not  propagated 
to  its  underlying  base  relations.   It  is  noted  here  that  IS/l 
also  allows  views  to  be  defined  on  other  views  to  any  number 
of  levels. 

When  the  storage  space  becomes  full,  13/1  automatically  deletes 
some  of  the  explicit  views.   Explicit  views  are  selected  for 
deletion  according  to  their  usage  counts,  sizes,  whether  or  not 
they  can  be  recreated  without  loss  of  data,  and  the  estimated 
computing  effort  required  to  recreate  them. 

5.2  ZSTA 

ZETA  supports  two  types  of  views,  namely  a  snapshot  and  an  auto- 
matic derivation.   The  implementation  schemes  for  these  are  due 
to  Czarnik  (50) . 

A  snapshot  is  a  time  invariant  picture  of  a  portion  of  the  data- 
base at  a  particular  instant  of  time.   A  snapshot  does  not  ref- 
lect any  changes  made  to  its  underlying  relations,  and  has  to  be 
recreated  each  time  an  updated  version  is  required.   An  automa- 
tic derivation,  on  the  other  hand,  is  a  time-dependent  view  that 
is  updated  whenever  its  underlying  relations  are  updated. 
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A  snapshot  defined  on  a  single  relation  may  be  implemented  in 
any  of  the  following  ways: 

First,  in  a  retrieval-oriented  environment,  the  marking  scheme 
of  the  low  level  system  (see  section  2.5)  can  be  used.   An  ac- 
cess to  a  snapshot  then  involves  indirect  accesses  to  the  ap- 
propriate tuples  via  their  tuple  identifiers  in  the  mark. 
Second,  in  an  update-oriented  environment,  the  overhead  required 
in  updating  the  marks  becomes  prohibitively  high,  and  a  snapshot 
is  implemented  as  an  explicit  relation.   This  scheme  sacrifices 
storage  space  in  favor  of  an  improved  processing  time. 
Third,  a  compromise  scheme  can  be  used  to  avoid  the  excessive 
processing  time  overhead  and  the  storage  space  penalty  of  the 
above  two  methods.   A  snapshot  is  initially  implemented  as  a 
mark.   Before  a  change  is  made  to  a  tuple  of  the  underlying 
relation,  the  tuple  is  copied  and  logged  onto  a  separate  data 
area.   The  mark  entry  is  then  modified  to  point  to  the  logged 
tuple.   When  the  number  of  updates  to  the  mark  reaches  a  thresh- 
old, the  snapshot  is  converted  to  an  explicit  relation  and  the 
mark  is  destroyed. 

An  automatic  derivation  is  implemented  by  storing  its  definition 
and  executing  it  each  time  an  access  is  made  to  the  view.  The 
marking  scheme  can  also  be  used  to  implement  an  automatic  deri- 
vation defined  on  a  single  relation. 

5.3  INGRES 

INGRES  does  not  currently  support  user  views.   However,  the  query 
modification  technique  (see  section  6.2)  developed  for  the  inte- 
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grity  and  access  control  (32,  33)  can  be  used  for  the  support  of 
user  views  in  future  versions  of  the  system. 

5.4  SYSTEM  R 

SYSTEM  R  allows  definition  of  views  via  the  query  facilities  of 
the  SEQUEL  data  sublanguage.   Queries  may  be  made  against  a  view, 
and  other  views  may  be  defined  on  it.   When  a  user  destroys  a 
view,  the  view  and  all  other  views  defined  on  it  are  destroyed. 
Storage  operations  against  a  view  are  supported  only  if  the  view 
is  a  selection  and/or  a  projection  on  a  single  base  relation  and 
it  also  contains  the  primary  key  of  the  base  relation.   Effects 
of  storage  operations  against  a  view  are  propagated  down  to  its 
underlying  base  relation.   Effects  of  storage  operations  on  the 
base  relations  immediately  become  visible  to  all  views  defined 
on  them. 

SYSTEM  R  stores  a  view  definition  as  a  Pre-Optimized  Package 
(POP)  containing  its  parse  tree  and  an  optimal  execution  scheme 
to  materialize  the  data  (determined  by  the  optimizer  as  discussed 
in  section  4.3).   SYSTEM  R  performs  an  operation  against  a  view 
by  merging  the  parse  tree  for  the  qualification  portion  of  the 
operation  into  the  parse  tree  for  the  view  definition,  then  op- 
timizing the  resulting  tree,  and  finally  performing  the  indicated 
operation  on  the  relevant  tuples  of  the  base  relation.   Note  that 
if  base  relations  or  the  access  paths  maintained  on  them  are  de- 
leted, the  POPs  of  all  views  defined  on  these  base  relations  are 
invalidated,  and  must  be  recreated  from  their  defining  SEQUEL 
statements. 
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6.  INTEGRITY  AND  AUTHORIZATION  CONTROL 

It  is  essential  that  an  integrated  database  system  in  a  multi- 
user environment  prevent  users  from  accessing  and  modifying  the 
database  in  unauthorized  ways  (authorization  or  access  control) 
and  prevent  authorized  or  unauthorized  users  from  violating  the 
semantic  accuracy  of  the  database  through  their  carelessness  or 
lack  of  knowledge  (integrity  control).   However,  only  a  few  sys- 
tems are  known  to  support  both  the  integrity  and  authorization 
control  facilities.   Integrity  control  is  provided  in  QUERY-BY - 
EXAMPLE,  INGRES,  and  SYSTEM  R;  while  authorization  control  is 
provided  in  MADAM,  SEQUEL,  QUERY-BY-EXAMPLE,  and  SYSTEM  R.   This 
section  deals  with  the  authorization  mechanism  of  MADAM,  and  the 
authorization  and  integrity  control  facilities  of  INGRES  and  SYS- 
TEM R.   r'or  a  good  introductory  discussion  on  database  security 
and  integrity,  see  Chapters  23  and  24  of  (6). 

6.1  MADAM 

An  authorization  mechanism  due  to  Owens  (53)  is  implemented  for 
^•fADAM  with  the  aid  of  the  MULTIC3  protection  facilities  (54). 
The  creator  of  a  relation  is  allowed  to  grant  to  other  users 
various  types  of  read  and  write  capabilities  on  any  column  of 
the  relation.   In  increasing  order  of  accessibility,  the  read 
capabilities  that  may  be  granted  are  null  (N),  manipulate  (M), 
statistical  (3),  and  print  (P).   N  permits  no  access,  and  is  the 
system  default.   M  permits  a  user  to  perform  relational  algebra 
operations  on  the  protected  column.   The  user  may  or  may  not  be 
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allowed  to  see  the  results  of  the  operations.   S  allows  a  user 
to  perform  statistical  functions  (e.g.,  average,  sum,  count)  and 
see  the  results.   P  allows  a  user  to  read  the  protected  column. 
Each  type  of  read  capability  is  considered  to  be  a  proper  subset 
of  the  next  more  accessible  capability.   Thus,  a  user  with  the 
3  capability  also  has  the  M  capability,  and  a  user  with  the  P  ca- 
pability has  the  3  and  M  capabilities  as  well. 

The  write  capabilities  that  may  be  granted  are,  in  increasing  or- 
der of  accessibility,  null  (N),  append  (A),  write  (W),  and  change 
access  (C).   N  permits  no  access,  and  is  the  default.   A  allows 
a  user  to  append  tuples  to  the  logical  end  of  the  protected  rela- 
tion.  T,r  allows  a  user  to  update  the  column  values.   C  permits  a 
user  to  manipulate  the  authorization  information  associated  with 
the  column  being  protected. 

furthermore,  before  the  system  allows  any  of  the  above-mentioned 
read  and  write  capabilities,  any  combination  of  the  following 
tests  that  may  be  specified  by  the  creator  of  the  relation  or 
those  users  to  whom  the  G  capability  has  been  granted  must  be 
satisfied : 

1)  list  of  user  characteristics  such  as  the  user-id,  terminal-id, 
day  of  the  week,  and  so  on.   Examples  of  authorization  restric- 
tions with  this  type  of  test  may  be  "if  user-id  =  Smith  and  term- 
inal-id =  57  and  day  =  Friday,  then  access  =  M"  or  "if  user-id 

t   Jones,  then  access  =  P." 

2)  content  of  the  column  being  protected.   For  example,  "if  SAL 
=  30X,  then  access  =  N." 

3)  context  in  which  the  column  being  protected  appears.   For  ex- 
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ample,  "SAL  must  not  be  retrieved  in  conjunction  with  the  corres- 
pondinr  NAME  values"  and  "3AL  must  not  be  retrieved  in  conjunc- 
tion with  NAME  =  Jones." 

*0  password.   The  user  must  provide  an  appropriate  password  in 
response  to  a  prompt  from  the  system. 

6.2  INGRES 

The  current  version  (version  5)  of  INGRES  does  not  support  selec- 
tive authorization  of  access,  and  also  enforces  only  simple  inte- 
grity assertions  that  involve  single  range  variables  and  no  aggre^ 
gate  functions.   So-called  query  modification  techniques  have 
been  developed,  however,  to  support  user  views,  selective  authori 
zation,  and  more  general  integrity  assertions,  and  they  are  cur- 
rently being  incorporated  into  the  system. 

Before  discussing  the  query  modification  techniques  in  detail, 
the  limited  database  control  features  implemented  in  the  current 
system  are  examined.   First  of  all,  INGRES  supports  a  set  of 
Databases.   Each  Database  has  a  special  user  called  the  Database 
Administrator  (DEA).   Each  Database  consists  of  a  set  of  rela- 
tions created  and  owned  by  the  DBA,  sets  of  relations  created 
and  owned  by  other  users  authorized  to  interact  with  the  Data- 
base, and  a  set  of  catalog  relations  containing  information 
about  all  other  relations  in  the  Database,  their  columns,  and 
the  secondary  indices  maintained  on  them.   The  relations  created 
by  the  DBA  are  shared  among  all  users  in  the  Database;  however, 
the  relations  created  by  other  users  are  not  shared.   The  DBA 
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can  destroy  any  relations  created  by  any  other  users. 

Let  us  now  see  how  the  proposed  query  modification  techniques 

can  be  uniformly  applied  to  implement  the  views,  authorization, 

and  general  integrity  assertions. 

Smith  defines  a  view  VEMP  on  EMP  as  follows : 

RANGE  OF  E  13  EKP 

DEFINE  VEEP  (E.NAKE,  E.DNO,  E.3AL) 

WHERE   E.SAL  > 20 K 

An  integrity  constraint  is  in  force  on  EMP: 

RANGE  0?  E  IS  EMP 

INTEGRITY  CONSTRAINT  IS   E.SAL  <  30K 

Smith  is  given  the  authority  to  update  salaries  of  employees 

he  manages : 

RANGE  OF  E  IS  EMP 

PROTECT  EMP  FOR  ALL  (E.3AL;  E.NAME,  E.DNO) 

WHERE  E.MGR  =  Smith 

( Smith  is  authorized  only  to  read  the  names  and  department 

numbers  of  his  employees) 

Now  suppose  that  Smith  wishes  to  give  Jones  a  20  percent  pay 

raise  by  issuing  the  following  QUEL  statement: 

RANGE  OF  S  IS  VEMP 

REPLACE  E  (SAL  =  1.2  *  S.SAL) 

WHERE    E.NAME  =  Jones 

Since  this  query  is  against  a  view,  it  is  first  modified  as 

follows : 

RANGE  OF  E  IS  EMP 
REPLACE  E  (SAL  =  1.2  *  E.SAL) 
WHERE    E.NAME  =  Jones   AND 
E.SAL  >  20K 

Note  that  the  predicates  of  the  query  and  the  view  definition 

are  merged,  and  also  that  the  range  variable  E  now  ranges  over 

the  base  relation  EMP. 
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^he  integrity  constraint  is  next  merged  into  this  modified  query 
to  yield  the  following: 

RANGE  CF  E  IS  EMP 

REPLACE   E  (SAL  =  1.2  *  E.3AL) 

WHERE    E.NAME  =  Jones   AND 

E.3AL  >  20 K  AND 

1.2    *   E.3AL  -c  30K 

Finally,  the  authorization  restriction  is  merged  into  the  query, 

and  the  resulting  query  is  ready  to  be  passed  down  to  process  3» 

RANGE  CF  E  13  EM? 

REPLACE   E  (3AL  =  1.2  *  E.3AL) 

WHERE    E.NAKE  =  Jones   AND 

E.3AL   ">  20K  AND 

1.2   *   E.SAL  <  30K     AND 

E.MGR  =   Smith 

6.3    SYSTEM  R 

SYSTEM  R  supports  extensive  data  control  facilities,  including 
authorization,  integrity  assertions,  transactions,  and  triggers. 

1)  authorization 

SYSTEM  R  does  not  provide  for  a  centralized  database  administra- 
tor.  Each  user  creates  and  deletes  his  own  base  relations,  and 
defines  and  destroys  views  on  them.   A  user  also  grants  and  re- 
vokes selected  privileges  on  his  views  and  base  relations  to 
other  users.   These  privileges  may  be  any  combinations  of  the 
following:   read,  insert  tuples,  delete  tuples,  update  selected 
fields  of  tuples;  specify  access  paths,  triggers,  and  integrity 
assertions;  and  grant  or  revoke  any  of  these  privileges  to  still 
other  users.   A  detailed  account  of  the  SYSTEM  R  authorization 
mechanism  is  given  in  (51). 
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The  nested  granting  of  privileges  and  their  subsequent  revocation 
complicate  the  implementation  of  the  authorization  mechanism. 
When  a  user  creates  a  base  relation,  he  is  fully  authorized  to 
perform  any  operation  on  it,  and  also  to  grant  any  or  all  privi- 
leges on  it  to  any  other  user.   The  update  privilege  is  held  on 
each  column  of  the  base  relation.   The  privileges  that  any  user 
nolds  on  a  base  relation  are  either  grantable  or  not  grantable. 
The  set  of  privileges  that  the  user  may  grant  to  other  users  is 
the  union  of  all  grantable  privileges  he  holds  on  the  relation. 
The  set  of  privileges  that  the  user  holds  on  the  relation,  on 
the  other  hand,  is  the  union  of  all  grantable  and  non-grantable 
privileges  he  holds  on  the  relation.   A  user  may  revoke  only 
those  privileges  he  previously  granted.   When  a  user  A  revokes 
a  grantable  privilege  P  from  another  user  B,  P  is  withdrawn  from 
user  B  and  all  other  users  to  whom  user  B  may  have  granted  P. 
Note  that  even  when  user  A  revokes  P  from  user  B,  user  B  may 
still  retain  ?  if  another  user  C  previously  granted  P  to  user  B. 
Consider  the  following  situation,  assuming  that  Smith  is  the  cre- 
ator of  the  EMP  relation: 

1 .  Smith  grants  to  Jones  the  read  and  insert  privileges  on 
EMP  with  the  grant  option. 

2.  Smith  grants  to  Brown  the  read  privilege  on  EMP  with  the 
grant  option. 

3.  Jones  grants  to  Brown  the  insert  privilege  on  EMP. 

Erown  is  then  authorized  to  insert  tuples  into  the  EMP  relation, 
make  queries  against  the  relation,  and  grant  the  read  privilege 
to  other  users.   Note  also  that  the  read  privilege  permits  Erown 
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to  define  views  on  the  EMP  relation. 
Now  if  Smith  revokes  all  privileges  from  Brown,  Brown  loses  the 
read  privilege  and  the  privilege  to  grant  the  read  privilege,  but 
retains  the  insert  privilege  that  Jones  granted  to  him.   If,  on 
the  other  hand,  Smith  revokes  the  grant  privilege  from  Jones, 
brown  loses  the  insert  privilege  as  a  result. 

When  a  user  defines  a  view,  his  privileges  on  it  are  restricted 
to  only  those  privileges  that  he  holds  on  the  underlying  views 
and  base  relations.   If  the  view  involves  more  than  one  relation, 
the  user's  privileges  are  defined  as  the  intersection  of  the  pri- 
vileges held  on  all  underlying  relations.   The  user  may  grant  a 
orivilege  ?  on  the  view  if  and  only  if  he  holds  a  grantable  privi- 
lege P  on  all  underlying  relations.   The  update  privilege  is  held 
on  a  column  of  the  view  only  if  it  is  held  on  that  column  of  all 
underlying  relations  in  which  the  column  appears. 

2)  integrity  assertions 

SYSTEM  R  supports  integrity  assertions  that  describe  either  valid 
states  (state  assertions)  or  valid  state  transitions  (transition 
assertions).   For  example,  the  assertion,  "No  employee  should 
earn  less  than  10K, "  is  a  description  for  a  valid  state;  while 
the  assertion,  "No  employee  should  be  given  a  pay  cut, "  constrains 
a  state  transition. 

SYSTEM  ?.   allows  both  state  and  transition  assertions  to  be  imposed 
on  individual  tuples  of  a  relation  or  on  sets  of  tuples.   For  ex- 
ample, "The  average  salary  of  employees  of  any  department  must 
not  exceed  15K"  is  a  state  assertion  applied  to  a  set  of  tuples, 
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while  ,,rvhe   average  salary  of  employees  of  any  department  must 
not  rise  by  more  than  10  percent  a  year"  is  a  transition  asser- 
tion on  a  set  of  tuples. 

3Y3TE?"  P,  furthermore,  allows  the  user  to  specify  a  sequence  of 
SEQUEL  statements  as  a  transaction,  and  checks  for  and  enforces 
integrity  constraints  at  the  end  of  each  transaction  by  default, 
so  that  consistency  constraints  may  be  applied  to  a  sequence  of 
complex  storage  operations  which  may  pass  through  states  that 
temporarily  violate  these  constraints.   For  example,  in  transfer- 
rins money  from  Smith's  savings  account  to  checking  account,  the 
consistency  constraint  that  Smith  maintain  a  constant  total  bal- 
ance is  temporarily  violated  when  the  savings  account  is  debited 
but  the  checking  account  is  not  yet  credited.   State  assertions 
apply  to  the  final  state  of  the  database,  while  transition  asser- 
tions compare  the  states  of  the  database  before  and  after  the 
transaction. 

When  integrity  assertions  are  declared  to  be  "immediate,"  however, 
they  are  enforced  immediately,  even  on  intermediate  stages  of  a 
transaction.   An  immediate  assertion  on  a  tuple  is  enforced  when- 
ever a  tuple  is  updated,  while  an  immediate  assertion  on  a  set 
of  tuples  is  enforced  after  each  SEQUEL  update  statement  is  exe- 
cuted.  SYSTEM  3  allows  the  user  to  specify  save  points  within  a 
transaction  and  back  up  either  to  the  beginning  of  the  transac- 
tion or  to  any  save  point  by  undoing  all  changes  made  to  the  data- 
base by  the  transaction.   The  user  may  also  establish  integrity 
points  (similar  to  the  transaction  save  points)  within  a  long 
transaction  in  order  to  have  the  transaction  backed  out  only  to 
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its  most  recent  integrity  point.   Integrity  constraints  are  en- 
forced at  each  integrity  point  within  a  transaction. 
Finally,  integrity  assertions  are  dynamically  created  and  deleted 
by  the  user.   The  SYSTEM  R  approach  to  integrity  control  is  des- 
cribed in  detail  in  (52). 

3)  Triggers 

A  trigger  is  a  sequence  of  SEQUEL  statements  that  is  automatical- 
ly executed  when  a  pre-specified  condition  is  satisfied.   For  ex- 
ample, whenever  an  EMP  tuple  is  inserted,  or  deleted,  or  its  DNC 
field  is  modified,  the  NEMP3  field  of  an  appropriate  DEFT  tuple 
(EMP.DNO  =  DEPT.DNO)  must  be  modified  to  reflect  the  change  in 
the  EM?  relation.   The  following  trigger,  named  EMPIN3ERT,  auto- 
matically increments  the  NEMPS  field  of  a  DEPT  tuple  when  a  new 
EMP  tuple  is  inserted: 

DEFINE  TRIGGER  EMPIN3ERT 
CM  INSERT  OF  EMP t 
(UPDATE  DEPT 

SET    NEMPS  =  NEMPS  +  1 

WHERE   DNC  =  NEW  EMP.DNO) 

Triggers  EMFDELETE  and  SMPUPDATE  can  also  be  easily  defined  for 
execution  when  an  EMP  tuple  is  deleted,  and  when  the  DNO  field 
of  an  EMP  tuple  is  modified,  respectively. 
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7.  CONCURRENCY  CONTROL 

Systems  that  support  concurrent  updates  on  a  shared  database  in- 
clude GMI3,  INGRES,  and  SYSTEM  R.   This  section  discusses  the 
concurrency  control  implemented  for  INGRES  and  SYSTEM  R. 

7.1  INGRES 

The  current  version  of  INGRES  provides  only  a  crude  concurrency 
control.   The  unit  of  data  consistency  is  one  QUEL  statement. 
INGRES  uses  physical  locks  on  columns  of  relations,  and  avoids 
deadlocks  by  not  allowing  an  interaction  to  pass  down  to  process 
3  until  it  can  lock  all  required  resources.   When  a  lock  request 
is  made,  the  LOCK  catalog  is  first  physically  locked,  and  then 
checked  for  the  presence  of  the  lock.   If  the  lock  is  not  in  the 
catalog,  it  is  placed  in  the  catalog,  and  the  lock  request  is 
satisfied.   If  the  lock  is  already  in  the  catalog  (i.e.,  the  lock 
is  held  by  another  concurrent  user),  the  concurrency  processor 
of  process  2  waits  for  a  fixed  interval,  and  then  tries  again. 
The  concurrency  processor  deletes  appropriate  locks  from  the  cat- 
alog only  when  it  receives  a  termination  message  from  process  3» 

7.2  SYSTEM  R 

SYSTEM  P.  implements  locking  techniques  at  the  logical  level  of 
relations  and  tuples  as  well  as  at  the  physical  level  of  pages. 
Locking:  at  the  logical  level  is  implemented  to  support  isolated 
backout  of  a  transaction  and  retrieval  of  "clean"  data  as  well 
as  synchronization  of  concurrent  updates.   The  isolated  backout 
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of  a  transaction  refers  to  the  data  consistency  constraint  that 
the  backout  of  a  transaction's  updates  must  not  undo  another 
concurrent  transaction's  updates.   A  data  item  remains  "dirty" 
as  lonr  as  the  transaction  within  which  it  has  been  updated  may 
be  backed  out;  it  becomes  "clean"  only  when  the  transaction  goes 
to  completion.   The  classic  "lost  update"  problem  is  also  resolved 
by  the  logical  level  locking. 

v.'hen  the  system  locks  relations,  tuples,  or  index  values  at  the 
logical  level,  it  must  also  acquire  locks  at  the  physical  level 
of  data  or  index  pages  to  insure  accurate  results.   For  example, 
transaction  Tl  makes  an  access  to  the  EMP  relation,  and  transac- 
tion m2   to  the  DEPT  relation.   Suppose  tuples  from  the  two  rela- 
tions are  stored  in  the  same  page.   Tl  may  obtain  the  byte  address 
of  an  SMP  tuple  from  a  header  area  slot;  and  before  Tl  retrieves 
the  tuple,  T2  may  retrieve  a  DEPT  tuple  and  invoke  a  data  compac- 
tion routine  to  reassign  tuple  addresses  in  the  page.   The  tuple 
address  that  Tl  previously  retrieved  may  not  point  to  the  correct 
tuple  any  longer.   The  logical  locks  set  on  the  two  relations  thus 
do  not  guarantee  accurate  results. 

Physical  locks  are  held  for  the  duration  of  a  single  SEQUEL  state- 
ment execution,  while  logical  locks  are  held  until  they  are  expli- 
citly released  or  to  the  end  of  a  transaction.  Locks  may  either 
be  shared  or  exclusive.  If  a  transaction  is  only  to  read  a  data 
item,  it  sets  a  shared  lock  on  the  data  item  and  allows  other  con- 
current transactions  to  read  the  same  data  item.  If  a  transaction 
holds  an  exclusive  lock  on  a  data  item,  no  other  concurrent  trans- 
actions are  allowed  to  read  or  update  the  data  item. 
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SYSTEM  R  locks  data  objects  at  various  granularities  so  that  ap- 
plications accessing  different  amounts  of  data  are  run  efficient- 
ly.  Locks  can  be  set  on  individual  tuples,  or  an  entire  relation, 
or  even  an  entire  segment.   For  a  transaction  requiring  an  access 
to  a  small  amount  of  data,  it  would  be  reasonable  to  lock  individ- 
ual tuples;  while  it  may  be  more  efficient  to  lock  entire  rela- 
tions, if  a  transaction  accesses  a  very  large  amount  of  data. 
Unlike  INGRES,  which  avoids  deadlocks,  SYSTEM  R  is  designed  to 
periodically  check  for  deadlock  situations,  and  recover  from 
deadlocks  by  backing  out  one  or  more  transactions  on  the  basis 
of  their  relative  ages  and  on  the  duration  of  their  locks. 
SYSTEM  R  implements  three  levels  of  consistency  in  isolating 
each  transaction  from  the  effects  of  other  concurrent  transac- 
tions.  At  any  level  of  consistency  the  isolated  backout  of  a 
transaction  is  guaranteed.   The  consistency  levels  differ  in 
the  degree  of  isolation  of  each  transaction  from  other  concur- 
rent transactions,  and  in  the  processing  overhead. 
A  transaction  at  level  1  consistency  may  read  "dirty"  data, 
and  different  values  for  the  same  data  during  the  same  trans- 
action.  Although  it  offers  the  least  degree  of  isolation  of 
a  transaction  from  others,  this  level  incurs  the  least  proces- 
sing overhead,  and  may  be  entirely  satisfactory  when  exact 
results  are  not  required.  At  this  level  no  logical  locks  are 
required  for  a  read  operation. 

At  level  2  consistency,  a  transaction  reads  only  "clean"  data? 
however,  it  may  still  read  different  values  for  the  same  data 
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durinr  the  same  transaction,  since  the  data  may  be  updated  by 
another  transaction  and  become  "clean"  before  the  first  trans- 
action goes  to  completion.   If  a  tuple  is  read,  a  shared  lock 
must  be  held  on  the  tuple  for  the  duration  of  the  read  opera- 
tion. 

At  level  3  consistency,  the  user  sees  the  logical  equivalent  of 
a  sinrle  user  system.   Each  transaction  reads  only  "clean"  data, 
and  the  same  values  for  the  same  data  during  the  same  transac- 
tion.  Although  it  guarantees  complete  isolation  of  a  transac- 
tion from  other  concurrent  transactions,  this  level  incurs  the 
highest  processing  overhead.   If  a  tuple  is  read  at  this  level 
of  consistency,  a  shared  lock  must  be  held  on  the  tuple  until 
the  end  of  the  transaction. 

It  is  noted  that  if  a  tuple  is  updated,  an  exclusive  lock  must 
be  held  on  the  tuple  for  the  duration  of  the  transaction  for 
all  three  levels  of  consistency. 

The  SYSTEM  P.  approach  to  concurrency  control  is  given  in  (37 » 
38). 

8.  SYSTEM  CHECKPOINT  AMD  RECOVERY 

^ost  of  the  early  prototype  relational  database  systems  do  not 
provide  functions  to  restore  the  database  to  a  consistent  state 
in  the  event  of  a  system  crash.   In  this  section  the  system 
checkpoint  and  recovery  schemes  implemented  in  INGRES  and  SYSTEM 
?.  are  presented. 

8.1  INGRES 


59 


Recall  that  for  an  update  operation  all  qualifying  tuples  are 
isolated  and  spooled  onto  a  temporary  file  in  process  3  and 
the  actual  update  of  the  base  relations  and  the  secondary  in- 
dices maintained  on  them  is  performed  in  process  b.    INGRES  ac- 
complishes recovery  from  system  crashes  that  leave  the  disk 
intact  by  either  destroying  the  temporary  deferred  update  file 
if  process  4  has  not  started  processing  it,  or  by  allowing  the 
deferred  update  file  to  be  completely  processed  if  process  k 
has  already  started  processing  it. 

To  recover  from  system  crashes  that  destroy  the  disk  files, 
the  INGRES  "superuser "  logs  all  database  interactions,  and 
checkpoints  the  database  onto  tape  using  the  UNIX  backup  scheme. 
Recovery  is  then  accomplished  by  restoring  the  last  checkpoint 
and  running  the  log  of  interactions. 

8.2  3Y3TEP4  R 

The  SYSTEM  R  scheme  for  recovery  from  system  crashes  that  leave 
the  disk  storage  intact  (i.e.,  only  the  contents  of  main  memory 
are  destroyed)  and  also  from  those  that  destroy  the  disk  storage 
is  based  on  the  use  of  transaction  logs  and  a  set  of  segment  re- 
covery functions. 

The  set  of  segment  recovery  functions  includes  the  RSI  operators 
OPEN_SEGMENT,  3AVE__3EGMENT ,  and  RE3TCRE_3EGMENT .   To  efficiently 
support  these  functions,  two  page  maps,  called  current  and  backup, 
are  associated  with  each  recoverable  segment.   When  a  segment  is 
first  opened  by  the  0PEN_3EGMENT  operator,  these  two  page  maps 
contain  identical  entries.   V/hen  a  page  of  the  segment  is  access- 
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ed  for  an  update  operation  for  the  first  time  since  the  segment 
was  opened  or  was  last  saved  (see  below),  the  page  is  fetched 
into  a  main  memory  buffer,  a  new  page  slot  is  allocated  on  the 
disk,  and  the  appropriate  entry  in  the  current  page  map  is  mod- 
ified to  reference  the  new  page  slot.   The  page  is  then  updated 
in  its  main  memory  buffer,  and  written  out  on  the  new  page  slot. 
The  5AVE__3EGMENT  operator  causes  all  pages  of  a  segment  that 
have  been  updated  in  their  main  memory  buffers  to  be  written 
onto  their  newly  allocated  page  slots.   All  old  page  slots  for 
those  pages  that  have  been  updated  since  the  last  segment  save 
point  are  released,  and  the  backup  page  map  entries  are  set 
equal  to  their  corresponding  current  page  map  entries.   The 
P.E3TCRS_3SGMENT  operator  causes  the  current  page  map  entries 
to  be  set  equal  to  their  corresponding  backup  page  map  entries 
and  all  newly  allocated  page  slots  to  be  released,  since  the 
backup  page  map  points  to  a  consistent  copy  of  the  segment. 
lecrments  dedicated  for  user  relations  and  segments  intended 
for  the  logs  of  transactions  are  both  recovered  via  these  re- 
covery functions. 

For  recovery  from  disk- intact  system  crashes,  the  system  fre- 
quently checkpoints  segments  containing  user  relations.   The 
transaction  log  segments  are  checkpointed  at  the  end  of  each 
transaction,  and  contains  both  the  old  and  new  values  of  all 
updated  data  objects. 

For  recovery  from  non-disk-intact  system  crashes,  the  system 
not  only  checkpoints  data  segments  and  transaction  log  seg- 
ments, but  also  copies  the  saved  pages  from  disk  to  tape. 
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